Body segmentation is an important step in many computer vision problems involving human images and one of the key components that affects the performance of all downstream tasks. Several prior works have approached this problem using a multi-task model that exploits correlations between different tasks to improve segmentation performance. Based on the success of such solutions, we present in this paper a novel multi-task model for human segmentation/parsing that involves three tasks, i.e., (i) keypoint-based skeleton estimation, (ii) dense pose prediction, and (iii) human-body segmentation. The main idea behind the proposed Segmentation--Pose--DensePose model (or SPD for short) is to learn a better segmentation model by sharing knowledge across different, yet related tasks. SPD is based on a shared deep neural network backbone that branches off into three task-specific model heads and is learned using a multi-task optimization objective. The performance of the model is analysed through rigorous experiments on the LIP and ATR datasets and in comparison to a recent (state-of-the-art) multi-task body-segmentation model. Comprehensive ablation studies are also presented. Our experimental results show that the proposed multi-task (segmentation) model is highly competitive and that the introduction of additional tasks contributes towards a higher overall segmentation performance.
translated by 谷歌翻译
Image-based virtual try-on techniques have shown great promise for enhancing the user-experience and improving customer satisfaction on fashion-oriented e-commerce platforms. However, existing techniques are currently still limited in the quality of the try-on results they are able to produce from input images of diverse characteristics. In this work, we propose a Context-Driven Virtual Try-On Network (C-VTON) that addresses these limitations and convincingly transfers selected clothing items to the target subjects even under challenging pose configurations and in the presence of self-occlusions. At the core of the C-VTON pipeline are: (i) a geometric matching procedure that efficiently aligns the target clothing with the pose of the person in the input images, and (ii) a powerful image generator that utilizes various types of contextual information when synthesizing the final try-on result. C-VTON is evaluated in rigorous experiments on the VITON and MPV datasets and in comparison to state-of-the-art techniques from the literature. Experimental results show that the proposed approach is able to produce photo-realistic and visually convincing results and significantly improves on the existing state-of-the-art.
translated by 谷歌翻译
The ability to effectively reuse prior knowledge is a key requirement when building general and flexible Reinforcement Learning (RL) agents. Skill reuse is one of the most common approaches, but current methods have considerable limitations.For example, fine-tuning an existing policy frequently fails, as the policy can degrade rapidly early in training. In a similar vein, distillation of expert behavior can lead to poor results when given sub-optimal experts. We compare several common approaches for skill transfer on multiple domains including changes in task and system dynamics. We identify how existing methods can fail and introduce an alternative approach to mitigate these problems. Our approach learns to sequence existing temporally-extended skills for exploration but learns the final policy directly from the raw experience. This conceptual split enables rapid adaptation and thus efficient data collection but without constraining the final solution.It significantly outperforms many classical methods across a suite of evaluation tasks and we use a broad set of ablations to highlight the importance of differentc omponents of our method.
translated by 谷歌翻译
实现自动化车辆和外部服务器,智能基础设施和其他道路使用者之间的安全可靠的高带宽低度连通性是使全自动驾驶成为可能的核心步骤。允许这种连接性的数据接口的可用性有可能区分人造代理在连接,合作和自动化的移动性系统中的功能与不具有此类接口的人类操作员的能力。连接的代理可以例如共享数据以构建集体环境模型,计划集体行为,并从集中组合的共享数据集体学习。本文提出了多种解决方案,允许连接的实体交换数据。特别是,我们提出了一个新的通用通信界面,该界面使用消息排队遥测传输(MQTT)协议连接运行机器人操作系统(ROS)的代理。我们的工作整合了以各种关键绩效指标的形式评估连接质量的方法。我们比较了各种方法,这些方法提供了5G网络中Edge-Cloud LiDAR对象检测的示例性用例所需的连接性。我们表明,基于车辆的传感器测量值的可用性与从边缘云中接收到相应的对象列表之间的平均延迟低于87毫秒。所有实施的解决方案均可为开源并免费使用。源代码可在https://github.com/ika-rwth-aachen/ros-v2x-benchmarking-suite上获得。
translated by 谷歌翻译
我们研究了复杂几何物体的机器人堆叠问题。我们提出了一个挑战和多样化的这些物体,这些物体被精心设计,以便要求超出简单的“拾取”解决方案之外的策略。我们的方法是加强学习(RL)方法与基于视觉的互动政策蒸馏和模拟到现实转移相结合。我们的学习政策可以有效地处理现实世界中的多个对象组合,并展示各种各样的堆叠技能。在一个大型的实验研究中,我们调查在模拟中学习这种基于视觉的基于视觉的代理的选择,以及对真实机器人的最佳转移产生了什么影响。然后,我们利用这些策略收集的数据并通过离线RL改善它们。我们工作的视频和博客文章作为补充材料提供。
translated by 谷歌翻译